Mixed Sequence Reader: A Program for Analyzing DNA Sequences with Heterozygous Base Calling

Authors

  • Chun-Tien Chang
  • Chi-Neu Tsai
  • Chuan Yi Tang
  • Chun-Houh Chen
  • Jang-Hau Lian
  • Chi-Yu Hu
  • Chia-Lung Tsai
  • Angel Chao
  • Chyong-Huey Lai
  • Tzu-Hao Wang
  • Yun-Shien Lee
Abstract

The direct sequencing of PCR products generates heterozygous base-calling fluorescence chromatograms that are useful for identifying single-nucleotide polymorphisms (SNPs), insertion-deletions (indels), short tandem repeats (STRs), and paralogous genes. Indels and STRs can be easily detected using currently available programs such as Indelligent or ShiftDetector, which do not search reference sequences. However, the detection of other genomic variants remains a challenge because appropriate tools for analyzing heterozygous base-calling fluorescence chromatogram data are lacking. In this study, we developed a free web-based program, Mixed Sequence Reader (MSR), which directly analyzes heterozygous base-calling fluorescence chromatogram data in .abi file format by comparison with reference sequences. Each heterozygous trace is resolved into two distinct sequences, which are then aligned with the reference sequences. Our results showed that MSR may be used to (i) physically locate indel and STR sequences and determine STR copy number by searching NCBI reference sequences; (ii) predict combinations of microsatellite patterns using the Federal Bureau of Investigation Combined DNA Index System (CODIS); (iii) determine human papillomavirus (HPV) genotypes in cases of double infection by searching current viral databases; and (iv) estimate the copy number of paralogous genes, such as β-defensin 4 (DEFB4) and its paralog HSPDP3.
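The abstract states that MSR resolves each heterozygous trace into two sequences and aligns them with a reference, but it does not give the procedure. The Python sketch below illustrates one simple way that idea can work for a heterozygous indel; it is a hypothetical illustration, not MSR's algorithm. It assumes the chromatogram has already been reduced to one IUPAC ambiguity code per position (peak calling from .abi data, for example via Biopython's `SeqIO` "abi" parser, is omitted) and that the calls start at the same position as the reference. The names `mix`, `split_mixed`, `first_heterozygous`, and `estimate_indel_shift` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# Standard IUPAC nucleotide codes for one or two bases (all a two-allele trace can produce).
IUPAC: Dict[str, FrozenSet[str]] = {
    "A": frozenset("A"), "C": frozenset("C"), "G": frozenset("G"), "T": frozenset("T"),
    "R": frozenset("AG"), "Y": frozenset("CT"), "S": frozenset("CG"),
    "W": frozenset("AT"), "K": frozenset("GT"), "M": frozenset("AC"),
}
_CODE_FOR = {bases: code for code, bases in IUPAC.items()}


def mix(allele1: str, allele2: str) -> str:
    """Simulate a mixed trace: encode two equal-length alleles as one IUPAC string."""
    return "".join(_CODE_FOR[frozenset((a, b))] for a, b in zip(allele1, allele2))


def first_heterozygous(mixed: str) -> int:
    """Index of the first ambiguous (two-base) call, or len(mixed) if there is none."""
    return next((i for i, code in enumerate(mixed) if len(IUPAC[code]) == 2), len(mixed))


def split_mixed(mixed: str, reference: str) -> Tuple[str, str]:
    """Split mixed calls into a reference-matching allele and its partner allele.

    Hypothetical rule: where the reference base is among the called bases it goes to
    allele 1 and the other base (if any) to allele 2; otherwise both get 'N'.
    Assumes `mixed` is already aligned to the start of `reference`.
    """
    allele1: List[str] = []
    allele2: List[str] = []
    for code, ref in zip(mixed, reference):
        bases = IUPAC[code]
        if ref in bases:
            other = bases - {ref}
            allele1.append(ref)
            allele2.append(next(iter(other)) if other else ref)
        else:
            allele1.append("N")
            allele2.append("N")
    return "".join(allele1), "".join(allele2)


def estimate_indel_shift(allele: str, reference: str, start: int,
                         max_shift: int = 10) -> Tuple[int, float]:
    """Find the signed offset k that best aligns allele[i] with reference[i + k], i >= start.

    k > 0 suggests a k-base deletion in `allele`, k < 0 an insertion.
    Returns (k, fraction of compared positions that match); ties keep the first k tried.
    """
    best_k, best_score = 0, 0.0
    for k in range(-max_shift, max_shift + 1):
        if k == 0:
            continue
        pairs = [(allele[i], reference[i + k])
                 for i in range(start, len(allele))
                 if 0 <= i + k < len(reference)]
        if not pairs:
            continue
        score = sum(a == r for a, r in pairs) / len(pairs)
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score


if __name__ == "__main__":
    reference = "GATTACACGGTCATGCTAGCCTGAAGTC"
    wild_type = reference[:-2]
    deleted = reference[:8] + reference[10:]  # simulated deletion of bases 8-9 (0-based)
    mixed = mix(wild_type, deleted)
    a1, a2 = split_mixed(mixed, reference)
    start = first_heterozygous(mixed)
    print(mixed, a2, start, estimate_indel_shift(a2, reference, start))
```

Assigning the reference-matching base to allele 1 only works when one allele equals the reference. Mixtures in which both sequences differ from it, such as two HPV genotypes or the DEFB4/HSPDP3 paralog pair, would presumably need each candidate sequence scored against its own reference instead.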

Similar resources

Base-Calling Algorithm with Vocabulary (BCV) Method for Analyzing Population Sequencing Chromatograms

Sanger sequencing is a common method of reading DNA sequences. It is less expensive than high-throughput methods, and it is appropriate for numerous applications including molecular diagnostics. However, sequencing mixtures of similar DNA of pathogens with this method is challenging. This is important because most clinical samples contain such mixtures, rather than pure single strains. The trad...

Decoding of Superimposed Traces Produced by Direct Sequencing of Heterozygous Indels

Direct Sanger sequencing of a diploid template containing a heterozygous insertion or deletion results in a difficult-to-interpret mixed trace formed by two allelic traces superimposed onto each other. Existing computational methods for deconvolution of such traces require knowledge of a reference sequence or the availability of both direct and reverse mixed sequences of the same template. We d...

Detection of New Silent Mutation at 348 bp Position in a CD18 Gene in Holstein Cattle Normal and Heterozygous for Bovine Leukocyte Adhesion Deficiency Syndrome

In India, Holstein and its crosses are being used extensively in breeding programmes, and all these breeding bulls are screened for autosomal recessive genes. Blood samples were collected in ethylenediaminetetraacetic acid (EDTA)-coated tubes, and DNA was isolated using the phenol-chloroform method. Polymerase chain reaction restriction fragment length polymorphism (PCR-RFLP) was performed using...

PolyPhred: automating the detection and genotyping of single nucleotide substitutions using fluorescence-based resequencing.

Fluorescence-based sequencing is playing an increasingly important role in efforts to identify DNA polymorphisms and mutations of biological and medical interest. The application of this technology in generating the reference sequence of simple and complex genomes is also driving the development of new computer programs to automate base calling (Phred), sequence assembly (Phrap) and sequence as...

Aligning Flowgrams to DNA Sequences

A read from 454 or Ion Torrent sequencers is natively represented as a flowgram, which is a sequence of pairs of a nucleotide and its (fractional) intensity. Recent work has focused on improving the accuracy of base calling (conversion of flowgrams to DNA sequences) in order to facilitate read mapping and downstream analysis of sequence variants. However, base calling always incurs a loss of in...

Volume: 2012

Publication date: 2012